Enhancing Data-Driven Phone Confusions Using Restricted Recognition
Abstract
This paper presents a novel approach to address data sparseness in standard confusion matrices and demonstrates how enhanced matrices, which capture additional similarities, can impact the performance of spoken term detection. Using the same training data as for the standard phone confusion matrix, an enhanced confusion matrix is created by iteratively restricting the recognition process to exclude one acoustic model per iteration. Since this results in a greater amount of confusion data for each phone, the enhanced confusion matrix encodes more similarities. The enhanced phone confusion matrices perform demonstrably better than standard confusion matrices on a spoken term detection task which uses both HMMs and DNNs.
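The restricted-recognition idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each reference phone segment already has one acoustic score per phone model (a hypothetical `scores` array), treats recognition as a per-segment argmax, and assumes counts from the unrestricted pass and from every restricted pass are simply summed. The names `confusion_counts` and `enhanced_confusion_counts` are illustrative only.

```python
import numpy as np

def confusion_counts(refs, scores, excluded=None):
    """Count (reference, hypothesis) pairs, optionally with one phone model disabled."""
    n_phones = scores.shape[1]
    masked = scores.copy()
    if excluded is not None:
        # Assumption: excluding a model means it can never be the winning hypothesis.
        masked[:, excluded] = -np.inf
    hyps = masked.argmax(axis=1)
    counts = np.zeros((n_phones, n_phones), dtype=int)
    np.add.at(counts, (refs, hyps), 1)  # rows: reference phone, columns: recognized phone
    return counts

def enhanced_confusion_counts(refs, scores):
    """Sum the standard counts with counts from one restricted pass per phone model."""
    total = confusion_counts(refs, scores)
    for phone in range(scores.shape[1]):
        total += confusion_counts(refs, scores, excluded=phone)
    return total

# Toy data (invented for illustration): 4 segments, 3 phone models.
refs = np.array([0, 0, 1, 2])
scores = np.array([
    [0.9, 0.6, 0.1],
    [0.8, 0.1, 0.5],
    [0.2, 0.7, 0.6],
    [0.1, 0.3, 0.9],
])

standard = confusion_counts(refs, scores)
enhanced = enhanced_confusion_counts(refs, scores)
print(standard)
print(enhanced)
```

On this toy data the standard matrix is purely diagonal, so it records no confusions at all, while the restricted passes force second-best hypotheses and populate off-diagonal cells from the same data.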
Similar resources
Automatic Phone Clustering Based on Confusion Matrices
Phone recognition experiments give information about the confusions between phones. Grouping the most confusable phones and making a multilevel hierarchical classification should improve phone recognition. In this paper a clustering method is investigated, based on a phone confusion matrix, for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language. The method is ba...
Acoustic and phonetic confusions in accented speech recognition
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be ...
On Confusions in a Phoneme Recognizer
In this paper, we analyze the confusion patterns at three places in the hybrid phoneme recognition system. The confusions are analyzed at the pronunciation, the posterior probability, and the phoneme recognizer levels. The confusions show significant structure that is similar at all levels. Some confusions also correlate with human psychoacoustic experiments in white masking noise. These struc...
Speech Recognition of Non-native Speech Using Native and Non-native Acoustic Models
A speech recognition system is subjected to the speech of non-native speakers, using both native and non-native acoustic phone models. The problems involved in mapping the phone set from the non-native to the native language are investigated, and a detailed analysis of phone confusions is made. For Dutch speakers, British English acoustic models give the best word recognition results.
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...